Algorithms for Data Science: Lecture 4

Author

  • Barna Saha
Abstract

We saw the Chernoff + union bound argument in action in the previous section, when we analyzed the outcome of reservoir sampling for items in [1, 100] over m iterations. There, the bad event Bad_i is the event that the number of times item i is sampled does not lie in the range $\frac{m}{100} \pm \frac{m}{200}$. By the Chernoff bound, $\Pr[\mathrm{Bad}_i]$ is minuscule for each i. Taking a union bound over the 100 bad events, the probability that at least one of them happens (which would leave us unconvinced about the uniformity of reservoir sampling) is at most $100 \times$ minuscule, which is still small. Let's take another example and follow the above argument rigorously.
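A small simulation can make the argument concrete. The sketch below is not from the lecture; it assumes that each of the m iterations runs reservoir sampling with a reservoir of size 1 over the stream 1, 2, ..., 100, and it uses the standard two-sided multiplicative Chernoff bound $\Pr[|X - \mu| \ge \delta\mu] \le 2e^{-\delta^2\mu/3}$ for $0 < \delta < 1$, whose constants may differ from the version stated in lecture. With $\mu = m/100$ and $\delta = 1/2$, the window $\mu \pm \delta\mu$ is exactly $\frac{m}{100} \pm \frac{m}{200}$, so the union bound gives $\Pr[\bigcup_i \mathrm{Bad}_i] \le 100 \cdot 2e^{-m/1200}$. The helper names `reservoir_sample_one`, `count_samples`, and `union_chernoff_bound` are hypothetical.

```python
import math
import random


def reservoir_sample_one(stream, rng):
    """Reservoir sampling with a reservoir of size 1 (assumed setup):
    the i-th item (1-indexed) replaces the current sample with probability 1/i,
    so every item ends up as the sample with probability 1/len(stream)."""
    sample = None
    for i, item in enumerate(stream, start=1):
        if rng.random() < 1.0 / i:  # always true for i = 1, since random() < 1
            sample = item
    return sample


def count_samples(n_items, m, seed=0):
    """Run reservoir sampling m times over the stream 1..n_items and
    return a list whose (i-1)-th entry counts how often item i was sampled."""
    rng = random.Random(seed)
    counts = [0] * (n_items + 1)  # index 0 is unused
    stream = range(1, n_items + 1)
    for _ in range(m):
        counts[reservoir_sample_one(stream, rng)] += 1
    return counts[1:]


def union_chernoff_bound(n_items, m, delta=0.5):
    """Upper bound on Pr[some item's count falls outside mu +/- delta*mu],
    where mu = m / n_items: the per-item two-sided Chernoff bound
    2*exp(-delta^2 * mu / 3), multiplied by n_items (union bound),
    clamped to 1 because it is a probability."""
    mu = m / n_items
    per_item = 2 * math.exp(-delta ** 2 * mu / 3)
    return min(1.0, n_items * per_item)
```

For example, with m = 20,000 the per-item window is [100, 300] and `union_chernoff_bound(100, 20000)` is about 1.2e-5, so a run in which every count from `count_samples(100, 20000)` lands inside the window is the expected outcome. For small m (say m = 100) the raw bound exceeds 1 and is clamped, reflecting that the union bound says nothing useful until μ is large.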


Similar resources

Algorithms for Data Science: Lecture on Finding Similar Items

Finding similar items is a fundamental data mining task. We may want to find whether two documents are similar to detect plagiarism, mirror websites, multiple versions of the same article, etc. Finding similar items is also useful for building recommender systems, where we want to find users with similar buying patterns. On Netflix, two movies can be deemed similar if they are rated highly by t...


Algorithms for Data Science: Lecture 3

1 Concentration Inequalities. Lemma 1 (Markov's inequality). Let X be a non-negative random variable. For all $\lambda > 0$, $\Pr[X > \lambda] \le \frac{E[X]}{\lambda}$. Lemma 2 (Chebyshev Inequality). For all $\lambda > 0$, $\Pr[|X - E[X]| > \lambda] \le \frac{\mathrm{var}[X]}{\lambda^2}$. Lemma 3 (The Chernoff Bound: Upper bound). Let $X_1, X_2, \ldots, X_n$ be independent random variables taking values in {0, 1} with $E[X_i] = p_i$. Let $X = \sum_{i=1}^{n} X_i$, and $\mu = E[X]$. Then the followin...


Algorithms for Data Science: Lecture on Clustering

Given a set of points with a notion of distance between points, group the points into some number of clusters so that members of a cluster are "close" to each other, while members of different clusters are far apart. The problem of clustering is ubiquitous. We may want to cluster documents by the topic they represent, we may want to cluster moviegoers by the types of movies they like, or cluster gene...


Algorithms for Data Science: Lecture 5

The balls-and-bins exercise that we did in Homework 1 is also useful for modeling hashing. A hash function h from a universe U = [0, 1, ..., m − 1] into a range [0, ..., n − 1] can be thought of as a way of placing items from the universe into n bins. The collection of bins is called a hash table. We can model the distribution of items in bins with the same distribution as m balls placed randomly ...


CMSC 858F: Algorithmic Game Theory, Fall 2010: Frugality & Profit Maximization in Mechanism Design

Recall from the previous lecture that in a combinatorial auction each bidder has an associated real-valued valuation function V defined for each subset of items S. An allocation of items $S_1, S_2, \ldots, S_n$ among the bidders with valuation functions $V_1, V_2, \ldots, V_n$, respectively, is socially efficient if the allocation maximizes the social welfare $\sum_i V_i(S_i)$. Combinatorial auction is a very genera...


Publication date: 2016